Stream VByte: Faster Byte-Oriented Integer Compression

Authors

Daniel Lemire

Nathan Kurz

Christoph Rupp

Abstract

Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google’s VARINT-GB). Although not known for their speed, they are appealing due to their simplicity and engineering convenience. Amazon’s VARINT-G8IU is one of the fastest byte-oriented compression techniques published so far. It makes judicious use of the powerful single-instruction-multiple-data (SIMD) instructions available in commodity processors. To surpass VARINT-G8IU, we present STREAM VBYTE, a novel byte-oriented compression technique that separates the control stream from the encoded data. Like VARINT-G8IU, STREAM VBYTE is well suited for SIMD instructions. We show that STREAM VBYTE decoding can be up to twice as fast as VARINT-G8IU decoding over real data sets. In this sense, STREAM VBYTE establishes new speed records for byte-oriented integer compression, at times exceeding the speed of the memcpy function. On a 3.4 GHz Haswell processor, it decodes more than 4 billion differentially coded integers per second from RAM to L1 cache.
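To make the idea of separating the control stream from the encoded data concrete, here is a minimal scalar sketch in Python. It is not the authors' SIMD implementation. It assumes 32-bit unsigned values, a 2-bit length code per integer (byte length minus one), four codes packed per control byte starting from the low bits, and little-endian data bytes. These layout details and all function names are assumptions chosen for illustration.

```python
def byte_length(value: int) -> int:
    """Number of bytes (1 to 4) needed to store a 32-bit unsigned value."""
    if value < (1 << 8):
        return 1
    if value < (1 << 16):
        return 2
    if value < (1 << 24):
        return 3
    return 4


def stream_vbyte_encode(values):
    """Return (control, data): length codes and payload bytes kept apart."""
    control = bytearray()
    data = bytearray()
    for start in range(0, len(values), 4):
        key = 0
        for slot, value in enumerate(values[start:start + 4]):
            if not 0 <= value < (1 << 32):
                raise ValueError("values must fit in 32 bits")
            n = byte_length(value)
            # Assumed layout: 2-bit code (n - 1), first integer in the low bits.
            key |= (n - 1) << (2 * slot)
            data += value.to_bytes(n, "little")
        control.append(key)
    return bytes(control), bytes(data)


def stream_vbyte_decode(control, data, count):
    """Decode `count` integers by reading lengths from the control stream."""
    values = []
    pos = 0
    for i in range(count):
        key = control[i // 4]
        n = ((key >> (2 * (i % 4))) & 0b11) + 1
        values.append(int.from_bytes(data[pos:pos + n], "little"))
        pos += n
    return values


if __name__ == "__main__":
    sample = [1, 300, 70000, 2**31, 5]
    ctrl, payload = stream_vbyte_encode(sample)
    print(len(ctrl), len(payload))
    print(stream_vbyte_decode(ctrl, payload, len(sample)))
```

Because every length is known from the control stream before any data byte is touched, a decoder can in principle process a whole group of integers at once, which is the property that makes such a layout a good fit for SIMD instructions.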


Similar resources

Vectorized VByte Decoding

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encount...
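The VByte layout described above can be sketched as follows. This is a minimal scalar illustration, not the vectorized decoder from the paper. It follows the stated convention (low 7 bits carry payload, high bit set to 1 on every byte except the last) and additionally assumes that the least significant 7-bit group is written first, which the snippet does not specify. The function names are hypothetical.

```python
def vbyte_encode(value: int) -> bytes:
    """Encode one non-negative integer as a variable-length byte sequence."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = bytearray()
    while value >= 0x80:
        # More bytes follow: store 7 payload bits and set the continuation flag.
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)  # Last byte: high bit is 0.
    return bytes(out)


def vbyte_decode_all(buf: bytes):
    """Decode every integer in a buffer of concatenated VByte codes."""
    values = []
    current = 0
    shift = 0
    for byte in buf:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # Continuation flag set: keep accumulating.
        else:
            values.append(current)  # High bit 0 ends the integer.
            current = 0
            shift = 0
    return values


if __name__ == "__main__":
    encoded = b"".join(vbyte_encode(v) for v in [5, 300, 16384])
    print(encoded.hex())
    print(vbyte_decode_all(encoded))
```

The decoder has to inspect the high bit of each byte before it knows where the next integer starts, a byte-by-byte dependency that makes plain VByte harder to vectorize.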


New adaptive compressors for natural language text

Semistatic byte-oriented word-based compression codes have been shown to be an attractive alternative to compress natural language text databases, because of the combination of speed, effectiveness, and direct searchability they offer. In particular, our recently proposed family of dense compression codes has been shown to be superior to the more traditional byte-oriented word-based Huffman cod...


Upscaledb: Efficient Integer-Key Compression in a Key-Value Store using SIMD Instructions

Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte compression over differentially coded keys. We revisit this problem with various compression alternatives such as Google’s VarIntGB, Binary Packing and Frame-of-Ref...


Boosting Text Compression with Word-Based Statistical Encoding

Semistatic word-based byte-oriented compressors are known to be attractive alternatives to compress natural language texts. With compression ratios around 30-35%, they allow fast direct searching of compressed text. In this article we reveal that these compressors have even more benefits. We show that most of the state-of-the-art compressors benefit from compressing not the original text, but t...


Impact-Based Document Retrieval

Two of the most important aspects contributing to the success of any document retrieval system are the query mechanism and the representation of its auxiliary operational data. The former greatly affects the quality of the retrieval results as well as the speed of the system. The latter reflects the ability of the system to represent its operational data in a compact form that reduces the stora...



Journal: Inf. Process. Lett.

Volume 130

Publication date: 2018
